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1. Introduction. Incomplete LU (ILU) matrix decompositions are 
used both as preconditioners to conjugate gradient (CG) methods, as well 
as smoothers in multigrid methods (Raw 1994, Stevenson 1994), in solving 
the linear systems that arise during the numerical solution of PDEs. A 
number of studies have examined the effect of matrix elimination order on 
the quality of ILU preconditioners (D’Azevedo, Forsyth and Tang 1992a, 
D’Azevedo, Forsyth and Tang 1992b, D’Azevedo, Forsyth and Tang 1992c, 
Duff and Meurant 1989, Dutto 1992, Notay 1993, Doi 1991). In (Duff and 
Meurant 1989, D’Azevedo et al. 1992a, D’Azevedo et al. 1992b, D’Azevedo 
et al. 1992c) evidence was presented to demonstrate how matrix ordering 
can have a profound effect on that quality when solving anisotropic PDEs 
with preconditioned CG (PCG) methods. 

This paper investigates an expression of the matrix ordering heuris- 
tic described in (D’Azevedo et al. 1992b, Margenov and Vassilevski 1994) 
which draws information from the PDE from which the linear system arises, 
and the grid over which it is discretized. If the factorization is 

A = LU -I- E 


where L and U are the lower and upper factors retained, and E the factors 
discarded, then we want to minimize some measure of E (such as the matrix 
norm). The heuristic is, in short, that the matrix rows should be ordered 
to follow anisotropy in the grid in the direction of weak connections (or 
close grid spacing), which, in a fundamental case, will be shown to produce 
low magnitude entries in the discarded fill. An explicit implementation 
of this heuristic using combinatorial algorithms was investigated in (Clift 
and Tang 1995). Reducing the bandwidth of the matrix is also known to 
improve the quality of an ILU factorization (Duff and Meurant 1989, Dutto 
1992). Increased fill overlap in the narrow bandwidth means the fraction 
of the retained fill is higher relative to the complete factorization with such 
orderings. 

One expression of these heuristics can be found in the two-sum of a 
matrix, which is minimized to find an ordering. Let matrix A be of size 
n x n, with components a,*j. We define the p-sum (or p- discrepancy), <r p (A) 
as 


(i.i) 


xl>) 


1 1 Ip 


a >j 1^(0 - V , (i)l p 

1 < i,j<n 

an? 0 


where and * V — ► {1 . . . \ V\} is an ordering, or mapping of the n vertices 
V of the associated graph (or the rows of the matrix). 

If an ordering follows the strongly connected nodes, and reduces the 
bandwidth, <r p (A) should be low, or minimized. The strong weights will 
then tend to be near the diagonal of the matrix in the mapping and have a 
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correspondingly small | ^(i) - ^{j)\ value, and the small weights will offset 
the larger distances from the diagonal. Thus we need to compute the 
problem for some matrix B that that corresponds to A but with inverted 
weights. 

The two-sum, cr 2 (£?), will be used as a measure of the weighted band- 
width of the matrix. We will be concerned with a minimum 2-sum problem 
associated with matrix B. We wish to find an ordering, a mapping such 
that 


<t 2 (.B) = mincr 2 (H, i />) . 

xp 

The fundamental technique we employ to minimize is that of (Hall 
1970), which involves computing the smallest eigenvector of that associated 
matrix. The existence of this spectral method dictates the choice of p = 2. 

The goal of this study was to evaluate the applicability of the basic 
ordering heuristics to various cases using a more mathematically sound ba- 
sis than had been previously attempted. The ordering methods presented 
here are computationally expensive, and thus not particularly practical 
for day-to-day use. They do, however, indicate where less expensive ex- 
pressions of the heuristics (such as those described in (Pothen, Simon and 
Wang 1992, Clift and Tang 1995)) can expect success, and point the way 
to their sound and appropriate application. 

In the sections immediately following we outline a number of the basic 
concepts used in our research. However, the reader may wish to review 
(Wallis 1983, Langtangen 1989, D’Azevedo et al. 1992b) for an outline of 
level based, incomplete, factorization (denoted ILU(/), where / is the level 
of fill retained in the preconditioner). We will be referring to matrices as 
weighted graphs, where the matrix rows represent vertices, and the graph 
edges are encoded in the off- diagonals, the magnitude of the off-diagonal 
coefficients providing the “strength” of the connections. The reader may 
wish to review (Parter 1961, Rose 1972, George and Liu 1981, D’Azevedo 
et al. 1992a) for relevant information on this view of matrices. The concepts 
connecting cr 2 , and the eigenvalues of an associated matrix are to be found 
in (Hall 1970, Fiedler 1975, Juvan and Mohar 1992, Juvan and Mohar 1993, 
Mohar and Poljak 1994), the latter being a particularly good survey paper 
on eigenvalues in combinatorics. These concepts have recently been used 
in graph decomposition (Pothen et al. 1992, Peyton, Pothen and Yuan 
1992, Hendrickson and Leland 1993, Chan, Schlag and Zien 1994), and 
matrix band-width reduction (Barnard, Pothen and Simon 1993, George 
and Pothen 1994). 

After justifying the heuristic for ordering two dimensional linear diffu- 
sion problems, we will link it with the mincr 2 problem. This paper presents 
experimental results and analysis for a set of test cases solved with a PCG 
method using an ILU preconditioner constructed with the resulting order- 
ing. The exposition will be repeated for three dimensional linear diffusion 
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problems, with important differences noted. In conclusion we will summa- 
rize guidelines about the validity and applicability of the heuristics. 

2. Lower bound of fill entries in decomposition. The theoretical 
justification for the heuristic, noted in the introduction, of following weak 
connections in the graph requires exploring the connection between the 
fill in an LU decomposition and the weighted graph representation of the 
matrix. The following provides an estimate of the lower bound of the fill 
which supports our heuristic. 

Let Ga ~ {Oa,£a) be the digraph represented by matrix A 

G a — {°1 , . . . , O n } , £a = {( 0 * ) 0 j ) I a Oi {pj ) = CLij ^ 0} 

Denote S be a subset of O and node u £ S. Node u is said to be reachable 
from a node v through S if there exists a path (v, 0 \ , 02 , . . . , o*, ii) in graph 
Ga , such that each o, € S, 1 < i < k (George and Liu 1981). Similarly, 
node u is said to be contributive to a node v through S if there exists a path 
(u, 0 \ y 02 , - . - , 0 }.,v) in graph Ga , such that each o t - £ «S, 1 < i < k. If A is 
self-adjoint, then the statement of U u is reachable from u” is equivalent to 
“u is contributive to t/\ 

The reachable set and contributive set of node o t through S is denoted 
by 

1Z = Reach(oi , S) = {u | u is reachable from o t * through S } 

C = Contributive(oi } S) = {u | u is contributive to o,- through S} 

For a self-adjoint matrix, 71 = C. If S — {oi, •• • , is the set of all 
nodes which are eliminated before o, , then 7Z is the set of nodes oj for 
which Uij ^ 0 and C is that of Iji ^ 0. Notice j > i. Let X = S U o t * and 
J — S U oj. Let A[l, J] be the submatrix extracted from the rows in X 
and columns in J , and A{X) — A[X,I] is the principal submatrix of A in 
the rows and columns defined by J. It is well known that Uj and Uji can 
be expressed in terms of determinants: 

_ det A[1,J] _ det A[J,X] 

ij ~ detA[Z,X] ’ Uij ~ det A[Z,Z\ 

Denote A p (v — ► u) to be the product of elements on the path p from v 
to t£, namely, 


Ap{y * u) — a v,oi a Gl ,02 * ' * u<?jk ,ti • 

Denote V(p , S) to be the subset of the nodes of S not belonging to path p 
and A{y{p> S)) be the principal submatrix of A in the rows and columns 
defined by V(p, 5), i.e., the nodes not on the path p. We restate a result 
from (Maybee, Olesky, van den Driessche and Wiener 1989) (Corollary 8.2) 
here: 
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Lemma 2.1. 

m 

(2.1) det A[L,J] = -*j) det A(V(p t ,S)) 

Jb = l 
m 

(2.2) det A[J,1\ = £(-l)'M Pk (;^t)det4(V>*,,S)) 

* = 1 

where the sum is over all distinct paths from 0 { to Oj in I U oj ; Ik is the 
length of path pk from to oj. 

We can have the following result directly from this lemma: 

Lemma 2.2. Let S he the subset of all nodes which are eliminated 
before node o, in the LU decomposition process and A = LU be the LU 
decomposition of A. Then 

Uij ^ 0 only if Oj is reachable through S from Oj, 

Iji ^ 0 only if o,* is contributive to Oj through S . 


If A is an Af-matrix, each term in the summations of (2.1) and (2.2) 
is positive. We then have the following lower bound: 

Theorem 2.1. Let matrix A be an M -matrix, A = LU and S = 
( 01 , 02 , • • - ,Oi_i). If oj is reachable from o t - through S, then 

(2.3) | u ,. i |> max Ji£ii£L 7 ^ > o 

Pk A Pk (X) 

where pk — (o,*, o/ 15 o* 2 , . . . , o\ k) Oj), o\ { € S is a path from node 0 { to oj and 

A Pk( T) - °i,i a h,h ■ ■ ' “Ik ,1k- 

If Oi is contributive to Oj through S , then 

(2.4) Kjt 1 > max — p ^ 0j rr °* }l - > 0 

Pk A Pk ( J ) 

where pk = ( o, , oi 1 , o/ 3 , . . . , o/ k , Oi), o ^ € S is a path from node o, to o,- and 
a p*(J) = a jJ a h,h ■ ■■ a h,h ■ 

Proof The proofs of (2.3) and (2.4 ) are similar, only (2.3) is shown. 
Let 1C be the complementary set of V(pk, S) in set S U o,-. It was shown in 
(Engel and Schneider 1977) that if A is a non-singular Af-matrix, then 

(2.5) 0 < det A(l) < det A(V(p h ,S)) det A{K). 

Notice, all principal minors of an Af-matrix are also Af-matrices. From 

(2.5) , a direct generalization can be obtained: 


( 2 . 6 ) 


detvl(J) = det j 4[J,2] < ana 22 * * *«»* 
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Since each term in (2.1) is positive, we have 

i, , _ det A[1,J] 
m ~ det A(l) 

| (» -» j)| det A(V(p k ,S)) 
det A(l) 

> \ApJj -> jJIdet^Vfrfc.S)) 

deti4[V(p*,5)]deti4(/C) 

> j ')1 

- A Pk (l) 

where p* is any path from o t - to Oj . □ 

If a particular ordering would allow a path on which large elements 
(weights) are located, large magnitude fill elements will result. This sup- 
ports our heuristics and our observations in testing. 

3. Two dimensional grid ordering heuristic. Consider the linear 
diffusion problem 

(3.1) V-(KVU) = -q 

in two dimensions over a square region 0 < x, y < 1 where K = K(x i y) is 
a position dependant coefficient vector, and q is the source term. We used 
the usual five-point finite difference discretization of this equation over a 
uniform, Cartesian grid, (a harmonic average is used to deal with the cases 
where K is discontinuous). This produces a symmetric nxn linear system 

Ax = b 


with between three and five entries per matrix row, and the source terms 
expressed in the right hand side. The linear system will have positive off- 
diagonals that correspond to the local values of K , which can be considered 
the connection strengths between nodes of the problem. Sources q(x ) y) are 
represented by multiplying the diagonal, and source strength in the right 
hand side of the linear system, by a sufficiently large value, thus retaining 
matrix symmetry. 

Our prototypical anisotropic problem is the case where K x = 1000 and 
K y = 1. The matrix which results is a symmetric M-matrix. If we scale the 
matrix so that the diagonal entries are unity, all the edges aligned along 
the z-axis will have a connection strength of 0(K x /K x + 1) and the edges 
aligned with the y-axis will have a connection strength of 0(l/K x ). If the 
natural x — y ordering is used, then all new fill entries will be oriented more 
in the x-direction (see figure 9.1), and by the lower bound given in theorem 
2.1, the fill entries will have a slow decay rate. If natural y — x ordering is 
used, the fill entries will have a more rapid decay rate. Thus, the value of 
the fill using the y — x ordering will have less of a bearing on the quality 
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of the preconditioner as the level of fill increases than the fill using x — y 
ordering. Natural y — x ordering was shown to produce a profoundly more 
effective preconditioner for the CG method in (D’Azevedo et al. 1992b). 

As we noted in the introduction, the quantity we minimize to imple- 
ment this heuristic is not the two-sum of A, but rather of a related matrix 
B. We shall use B , with elements bij such that 


(3.2) 



sjj where a tj ^ 0 
0 where a tJ = 0 


Our case of constant K coefficients over the entire region permits us to 
demonstrate the following. 

Lemma 3.1. The two-sum of B under natural ordering is a local min- 
imum with respect to any single permutation of the matrix labelling. 

Proof The proof is trivial. 

If we consider cases where N x « N y then if K x > K y , and hence 
1 /K x < 1 /K y then 

<t 2 {B, x-y ordering) = -- (N x — 1) N y + -^-NyN% 

JVX Ay 

(t 2 (B, y-x ordering) = -^-(N y - 1) N x + -£-N x N* 

Ay J\ X 

and we see that of the two local minima associated with natural orderings, 
the one dictated by the above heuristic, the y — x ordering, corresponds 
to that with the lowest a 2 - However, if N x >- N y then a natural ordering 
that reduces bandwidth can be expected to produce the lowest two-sum 
because N x Ny <C N y N%. This will be seen in the experimental results 
over a long thin problem, even with K x = 1000/\ y (see the LONGTHIN 
problem below). 

4. Spectral Ordering Algorithm. We now present the spectral or- 
dering algorithm, which will be justified in the next section. Given matrix 
A we define symmetric matrix C with entries 1 

I l/o.j I 

C,3 ~{ 0 i = j 

and diagonal matrix D with entries 

du = ^ Cij . 

j 


1 Another definition for C proposed during our research was to set c tJ = 
1/ exp(|ajj|) V aij ^ 0. This, however, produced matrix entries that varied by too 
many orders of magnitude, making eigenvalues impossible to compute for many test 
cases, and producing no favourable effects in general. 
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We can define the weighted pseudo-laplacian of B as 

(4.1) Bt = C-D 

which is a symmetric, positive semi-definite matrix of rank (n — 1) (Hall 
1970). Using Bi as the matrix related to A, with connection strengths 
inverted, the spectral ordering algorithm proceeds as in figure 9.2. 

5. Minimizing a 2 • The spectral ordering algorithm is justified by the 
proof originally presented by (Hall 1970), and presented in the unweighted 
context by (Barnard et al. 1993). Detailed analysis of this sort of algorithm 
is presented in (George and Pothen 1994). The following demonstrates 
that computing the second weighted pseudo-Laplacian eigenvector solves a 
continuous relaxation of the discrete problem of minimizing the two-sum 
of a matrix. 

For odd (even) n, let ^ denote the set of orderings xj? as defined in 
Equation 1.1. We shall evaluate the mapped row positions xp(i) as integers 
such that 1 < xj>(i) < n. Thus 

mm^g*^#/) = ^mi'rtyg# ^ . 

c*j 9*^0 

Consider a continuous vector x E with elements z t . We can define a 
permutation induced by x by the rule that pi < pj if and only if 

x i < Xj . This ordering is unique except where two or more components of 
x are equal. 

We now minimize the two-sum over the class of unit n - vectors xG-f 
satisfying x ^ 0, x*x = 1, x t u = 0 where u* = (1, 1, . . . , 1). This relaxes the 
condition that the reordering vector must belong to a set of permutation 
vectors, leaving the continuous optimization problem: 


-min X £x ^ Cij(xi - Xj ) 2 
" Ci^O 


(5.1) 

(5.2) 


/ n 

min xeX I ^ daxj - ^ CijXiX, 

\c; = l dj? 0 

(x t Dx — x*Cx) 
min X £xx t Bx 
\2E\E2 = ^2 


Noting that the first eigenvalue of B is zero, and its corresponding 
eigenvector trivial, the second weighted pseudo-Laplacian eigenvector E 2 
solves the continuous approximation to the two-sum problem. The proof 
that the resulting permutation rp x induced by E 2 is the discrete vector that 
is closest to minimizing <72 is to be found in the papers cited above. 

As a demonstration we compute the second smallest eigenvector of 
the discretized 2D anisotropic problem defined by equation 3.1. With the 
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coefficients K x = 1000, and K y = 1 the eigenvector for the problem over a 
3x3 grid is 

(5.3) E 2 = {1,1,1, 0,0, 0,-1, -1,-1} 

and we see that the natural ordering is permitted under the procedure for 
inducing the ordering tp. For the problem over a 30 x 30 grid the eigenvalue 
is plotted as a surface in figure 9.3. Again, natural ordering is, modulo the 
tie-breaking within rows, the result. The eigenvector exhibits the same 
anisotropy as the matrix. We observe this in other problems as well, such 
as the AN ISO problem below. 

At this point we assert, admittedly on less than rigorous grounds, that 
the spectral ordering algorithm is a reasonable embodyment of the ordering 
heuristics, and that these heuristics are a reasonable approach to ordering 
two-dimensional problems for ILU factorization. 

6. Experimental results for 2D problems. For comparison to the 
spectrally based orderings, we will also compute the Reverse Cuthill-McKee 
(RCM) (George and Liu 1981), Minimum Update Matrix (MUM), and Min- 
imum Discarded Fill (MDF) orderings (D’Azevedo et al. 1992a, D’Azevedo 
et al. 1992b, D’Azevedo et al. 1992c). RCM has been shown to be a good 
bandwidth reducing ordering (Duff and Meurant 1989, Dutto 1992), and 
works only on the sparsity pattern of the matrix. MUM and MDF are sensi- 
tive to the matrix coefficients, and have been shown to produce highly effec- 
tive orderings in a number of situations (although MUM is better for prob- 
lems with larger computational molecules) (Clift and Forsyth 1994, Chin, 
D’Azevedo, Forsyth and Tang 1992, D’Azevedo et al. 1992b, Notay 1993). 
The MUM and MDF orderings are, relative to RCM, quite expensive to 
compute. 

Table 9.1 shows the effect of the various orderings on a set of 2D prob- 
lems over a regular grid. Appendix A gives the details of how each problem 
is specified. Each problem has a different layout, or degree of anisotropy. 
The problems were solved using the conjugate gradient method, with a 
level 1 ILU preconditioner (denoted ILU(l) ), i.e. the first level of fill was 
kept. ILU(0) was found to be relatively insensitive to ordering, and since 
the heuristic only really addresses the first level of fill, higher levels would 
be expected, and were found, to benefit less from this ordering strategy. (In 
addition, higher levels of fill are not practical for many real applications). 
The solutions were converged to a residual of 10" 12 relative to the residual 
of the initial zero vector guess. 

The work required is measured as the number of iterations required 
during the solution phase, multiplied by the total number of non-zero en- 
tries in the matrix and the preconditioner. We justify this work measure- 
ment by the fact that the matrix- vector multiply and preconditioner- vector 
solve are the two variable components in the cost of the PCG algorithm, 
and the most expensive operations. The computation of the preconditioner 
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is generally a small cost relative the solution iteration. The first column 
(RCM ordering) of Table 9.1 shows the absolute measure of work, and the 
remaining columns are shown as fractions of the work required to solve the 
problem relative to RCM ordering. 

6.1. Spectral ordering. Spectral ordering performed generally as 
expected. For the isotropic problem, LAPD5, bandwidth reduction is 
the only heuristic applicable. The BIG1DIR problem, with its uniform 
anisotropy, was ordered quite well. The ANISO problem, with abutting 
anisotropic regions of differing directions, was also ordered properly. With 
one exception, the spectral ordering algorithm performed well whenever 
the MDF algorithm did well. 

The STONE and STONEROT90 problems have exactly the same lay- 
out, except for a rotation of 90 degrees. Figures 9.4 and 9.5 show the eigen- 
vectors of the two problems. Note that in both, the region of K x = K y = 0 
appears as a flat spot on the surface plot. The STONEROT90 case eigen- 
vector is such that this appears where it will be ordered roughly between 
the nodes surrounding it, whereas in STONE the region will not be ordered 
so as to keep it with its neighboring nodes. This shows a weakness of the 
spectral ordering algorithm in dealing with such blocks. 

The LONGTHIN problem has a direction of anisotropy that would 
cause the weak-direction following heuristic to order in the direction that 
maximizes the bandwidth. Figure 9.6 shows the sparsity pattern for spec- 
tral ordering, indicating that the bandwidth reduction tendencies of the 
ordering dominate the other heuristic. 

7. Three dimensional grid ordering heuristic. In three dimen- 
sions the ordering heuristics becomes less clear cut. As with many other 
topics related to topology (such as triangulations, hierarchichal bases, etc.), 
the weak connection heuristic for the plane graph can not be generalized 
to the three-dimensional grid. The bandwidth reduction hypothesis still 
holds, but following the weak connections in the graph can fail to improve 
the ILU factorization. 

Consider the incomplete factorization process for equation 3.1 in three 
dimensions, discretized over a square grid with constant coefficients k y 
and k z . During the elimination process, we will follow the x direction first, 
then the y direction. Figure 9.7 shows this schematically, with the shaded 
nodes being eliminated in the order shown by the arrow. By theorem 2.1 
the magnitude of the three first level fill elements created when node 1 is 
eliminated (shown in dashed lines) will be 

Level 1 Fill j k z k y 

y hyh% 

The second level of fill generated when node 2, and, later, the first node 
in the y direction, node 4, are eliminated will have four elements with 
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magnitudes of order 


Level 2 Fill < 


For an ILU(l) factorization, these second level fill elements are dis- 
carded. Figure 9.2 shows the magnitude of the sum of the fill discarded 
given different combinations of weighting coefficients. Note that when there 
are two weak, and one strong direction, there is an ordering that discards 
substantially less fill (« 10 3 versus « 10 6 in magnitude). If there are two 
strong directions, then there is no ordering that discards substantially less 
fill than the others. 

This is borne out experimentally. Table 9.3 shows the number of PCG 
iterations required to solve three problems with homogeneous anisotropy 
tested over the six natural orderings (the fill patterns were identical, hence 
measuring using solver work as defined in section 6 was not required). The 
problem with one strong, and two weak connections shows that an order- 
ing with the two weak connections first produces a good preconditioner, 
and thus fewer iterations were required to get a solution. The problem 
with two strong weights had no good ordering at all. The problem with a 
weak, medium, and strong and weight showed that although ordering the 
directions from those of weakest to strongest connections produced a good 
ordering. 

Also of note is the measure of the two-sum of the inverse-weighted 
Laplacian, shown in table 9.4 in entries corresponding to table 9.3. A 
low two-sum was consistently associated with a good ordering, when one 
existed. Good orderings did not necessarily have a low two-sum. 

8. Experimented results for 3D problems. As in section 6, Table 
9.5 shows the effect of the various orderings on a set of 3D problems over 
a regular grid. These problems were solved using the same method and 
tolerances as the 2D problems. The table is also displayed in terms of 
solver work, with RCM given as an absolute value and other orderings 
shown as fractions of that work. 

Spectral ordering did not perform as well as hoped. The BIG1DIR3F 
case looks good because the RCM algorithm accidentally created a very 
poor ordering (RCM is coefficient insensitive). Only in the weak, weak, 
strong case (BIG1DIR3G) were the uniform anisotropy cases effectively 
ordered. All other results were poor, indicating that the heuristic embodied 
in spectral ordering has a limited application. 

An interesting case in point is the ANIS03D problem. Figure 9.8 
visualizes three planes through the eigenvector values field. Smaller icons 
represent lower values. Note that the division between the eight quadrants 
of the problem, particularly between the front and back (as shown) are 


k z k x k x 
ky k x k x 
k z ky ky 
2 (^k z ky k x 
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quite visibly expressed in the eigenvector. However, Figure 9.9 shows that 
the sharp distinctions are lost in the ordering, (as visualized by the same 
technique). We note that opposite corners on the back plane would be 
ordered together, which runs counter to the heuristics. This failure to 
capture distinct regions suggests why the spectral ordering technique fared 
poorly, even though the anisotropic blocks have one strong, and two weak 
coefficient directions. 

It is interesting to note that MDF, which uses the more aggressive 
heuristic of evaluating discarded fill directly to determine successive nodes 
in an ordering, was able to produce more consistent results. The BIG1DIR3H 
problem, which has two strong and one weak direction was slightly im- 
proved over RCM ordering. 

9. Conclusions. In this work a more rigorous basis for matrix or- 
dering heuristics was set in the context of two-dimensional, anisotropic 
diffusion problems discretized over a regular grid. The minimization of the 
two-sum of the weighted pseudo-Laplacian of the matrix has been shown 
to be a good expression of the weak connection following heuristic, and the 
bandwidth reduction heuristic. Using a well-known spectral algorithm to 
minimize this two-sum, the resulting orderings have been experimentally 
tested for their effectiveness for both two and three dimensional problems. 

This study shows that the weak connection following heuristic is most 
effective over the problems from which it was derived: two dimensional, 
regular grid problems with fairly simple patterns of anisotropy. We have 
demonstrated that there are three dimensional cases where following the 
weak connections in the grid must fail to improve the ILU factorization. 
These cases, where more than one of the grid directions has strong connec- 
tivity, appear to be reasonably ordering insensitive. Two dimensional prob- 
lems, with minimal connectivity (i.e. the 5 point finite difference stencil) 
seem to be the most sensitive to the ordering used for ILU factorizations. 
MDF ordering produced improvements in cases demonstrated to be insen^ 
sitive to the direction heuristic, indicating that more aggressive ordering 
techniques can still produce savings. 

In (Clift and Tang 1995) an inexpensive approximation to the weak 
connection following ordering heuristic was developed and tested. However, 
that study did not reveal or predict the reasons why the methods succeeded 
or failed. Rapid methods for ILU ordering still remain to be developed 
for anisotropic convection-diffusion problems where convection dominates. 
This is the object of a current research project. 

Acknowledgements. The authors would like to thank Dr. Alex 
Pothen for his insights into direct matrix solver orderings using spectral 
methods, and his suggestion that we investigate spectral methods for in- 
complete LU decompositions, which lead to this study. 



SPECTRAL ORDERING METHODS FOR ILU PRECONDITIONERS 13 


REFERENCES 


Barnard, S. T., Pothen, A. and Simon, H. D.: 1993, A spectral algorithm for envelope 
reduction of sparse matrices, Tech. Rep. CS-93-49, University of Waterloo. 

Chan, P. K., Schlag, M. D. F. and Zien, J. Y.: 1994, Spectral A;- way ratio-cut partitioning 
and clustering. Computer Engineering, U. California, Santa Cruz. 

Chin, P., D’Azevedo, E. F., Forsyth, P. A. and Tang, W.-P.: 1992, Preconditioned 
conjugate gradient methods for the incompressible Navier-Stokes equations, 
International Journal for Numerical Methods in Fluids 15, 273-295. 

Clift, S. S. and Forsyth, P. A.: 1994, Lienar and non-linear iterative methods for the 
incompressible Navier-Stokes equations, International Journal for Numerical 
Methods in Fluids 18, 229-256. 

Clift, S. S. and Tang, W.-P.: 1995, Weighted graph based ordering techniques for pre- 
conditioned conjugate gradient methods, BIT : Nordisk Tijdskrift for Infor- 
mationbehandling 35, 30-47. 

D’Azevedo, E. F., Forsyth, P. A. and Tang, W. P.: 1992a, Ordering methods for precon- 
ditioned conjugate gradient methods applied to unstructured grid problems, 
SIAM J. Matrix Anal. Appl. 13(3), 944-961. 

D’Azevedo, E. F., Forsyth, P. A. and Tang, W.-P.: 1992b, Towards a cost-effective ILU 
preconditioner with high level fill, BIT 32, 442-463. 

D’Azevedo, E. F., Forsyth, P. A. and Tang, W.-P.: 1992c, Two variants of minimum dis- 
carded fill ordering, in R. Beauwens and P. de Groen (eds) , Proc. IMA CS In- 
tern. Symp. on Iterative Methods in Linear Algebra , North- Holland, pp. 603- 
612. 

Doi, S.: 1991, A Gust afsson- type modification for parallel ordered incomplete factor- 
izations, Technical report , C&C Information Technology Research Labs, NEC 
Corp.,. 

Duff, I. S. and Meurant, G. A.: 1989, The effect of ordering on preconditioned conjugate 
gradients, BIT: Nordisk Tijdskrift for Inf ormationbehand ling 29, 635-657. 

Dutto, L. C.: 1992, The effect of ordering on preconditioned GMRES algorithm, for 
solving the compressible Navier-Stokes equations, Research report , Universite 
de Montreal, CRM. to appear in International Journal for Numerical Methods 
in Engineering. 

Engel, G. and Schneider, H.: 1977, The Hadamard- Fischer inequality for a class of matri- 
ces defined by eigenvalue monotonicity, Linear Multilinear Algebra 4, 155-176. 

Fiedler, M.: 1975, A property of eigenvectors of nonnegative symmetric matrices and 
its application to graph theory, Czech. Math. Journal 25(100), 619-633. 

George, A. and Liu, J. W. H.: 1981, Computer Solutions of large sparse positive- definite 
systems, , Prentice- Hall, Englewood Cliffs, New Jersey. 

George, A. and Pothen, A.: 1994, Analysis of a spectral envelope reduction algorithm 
via quadratic assignment problems. Draft Preprint. 

Hall, K. M.: 1970, An r-dimensional quadratic placement algorithm, Management Sci- 
ence — Theory Series 17(3), 219-229. 

Hendrickson, B. and Leland, R.: 1993, A multilevel algorithm for partitioning graphs, 
Technical Report SAND93-1301 , Sandia National Laboratories. 

Juvan, M. and Mohar, B.: 1992, Optimal linear labelings and eigenvalues of graphs, 
Discrete Applied Mathematics 36, 153-168. 

Juvan, M. and Mohar, B.: 1993, Laplace eigenvalues and bandwidth-type invariants of 
graphs, Journal of Graph Theory 17(3), 393-407. 

Langtangen, H. P.: 1989, Conjugate gradient methods and ILU preconditioning of 
non-symmetric matrix systems with arbitrary sparsity patterns, International 
Journal for Numerical Methods in Fluids 9, 213-233. 

Margenov, S. D. and Vassilevski, P. S.: 1994, Algebraic multilevel preconditioning of 
anisotropic elliptic problems, SIAM J. Set. Comput. 15(5), 0126-1037. 

Maybee, J., Olesky, D., van den Driessche and Wiener, G.: 1989, Matrices, digraphs 
and determinants, SIAM J. Matrix Anal. Appl. 10, 500-519. 



14 


SIMON S. CLIFT, HORST D. SIMON AND WEI-PAI TANG 


Mohar, B. and Poljak, S.: 1994, Eigenvalues in combinatorial optimization. Preprint. 
Notay, Y.: 1993, Ordering methods for approximate factorization preconditioning, Tech- 
nical report , Service de Metrologie Nucleaire, Universite Libre de Bruxelles. 
Parter, S. V.: 1961, The use of linear graphs in gaussian elimination, SIAM Review 
3, 364-369. 

Peyton, B. W., Pothen, A. and Yuan, X.: 1992, Partitioning a chordal graph into 
transitive subgraphs for parallel sparse triangular solution, Technical Report 
CS-92-55 } University of Waterloo. Submitted to Linear Algebra and its Ap- 
plications. 

Pothen, A., Simon, H. D. and Wang, L.: 1992, Spectral nested dissection, Technical 
Report 92-01 , Computer Science Dept., Pennsylvania State University. 

Raw, M.: 1994, Coupled algebraic multigrid for the solution of the discretized 3D Navier- 
Stokes equations, in J. J. Gottlieb and C. R. Ethier (eds), Proceedings of CFD 
94 , CFD Society of Canada, pp. 335-344. 

Rose, D.: 1972, A graph- theoretic study of the numerical solution of sparse positive 
definite systems of linear equations, in R. C. Read (ed.), Graph Theory and 
Computing , Academic Press, pp. 183-217. 

Stevenson, R.: 1994, Modified ILU as a smoother, Numerische Mathematik 67, 295-309. 
Wallis, J. R.: 1983, Incomplete gaussian elimination as a preconditioning for generalized 
conjugate gradient acceleration, Proceedings of the 1983 SPE Symposium on 
Reservoir Simulation in San Francisco. SPE 12265. 

A. Problem Definitions. Tables A.l and A. 2 define the problems 
used in this work by listing the grid size, block sizes and associated coef- 
ficient values, and the “background” coefficient values that cover the rest 
of the field. Source terms not given since these do not substantially affect 
the ordering processes used in this work. 
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Fig. 9.1. The fill resulting from elimination with natural x — y and y — x ordering. 

1. Compute B\. 

2. Compute the second lowest eigenvalue A 2 of B\ and the 

ASSOCIATED EIGENVECTOR E 2 . 

3. Construct pointer array order such that order(i) = i 
for all i = 1 . . .n. 

4. Sort the order array such that {^(order^')) | i = 1 . ..n} is 

IN ASCENDING ORDER. 


Fig. 9.2. Spectral ordering algorithm . 



FlG. 9.3. Second weighted pseudo-Laplacian eigenvector for a simple anisotropic prob- 
lem . 
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30 


Fig. 9.4. Second, weighted pseudo-Laplacian 


eigenvector for the STONE problem. 



Fig. 9.5. Second weighted pseudo-Laplacian eigenvector for the STONEROT90 prob- 
lem. 
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FlG. 9,8. Eigenvalue field for ANISOSD. Smaller icons denote smaller values for the 
eigenvector at that point . 
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FIG. 9.9. Ordering vector for ANISOSD. Smaller icons denote earlier elements in the 
ordering. 
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Problem 

Ordering Method 

RCM 

MDF 

MUM 

Spectral 

ANISO 

3.97e+05 

0.70 

1.26 

0.63 

BIG1DIR 

3.86eH-05 

0.41 

0.95 

0.47 

ANISOCENT 

9.36e+05 

0.95 

1.06 

1.00 

EXTREMEANI 

9.74e+05 

0.69 

1.19 

0.81 

LAPD5 

3.76e+05 

0.93 

1.00 

1.03 

LONGTHIN 

2.27e+04 

1.12 

1.00 

1.03 

STONE 

4.24e+05 

0.85 

1.14 

1.33 

STONEROT90 

4.24e+05 

0.85 

1.16 

0.90 

VDVORST 

4.13e-f 05 

0.33 

1.00 

0.53 


Table 9.1 

Work required to perform 2D regular grid problem iteration at ILU(l). The RCM 
column shows actual work required. The other columns show work as a fraction relative 
to the RCM column. 


Element Weights in Each 
Ordering Direction 
First Second Third 

^(discarded fill elements) 

i 

1 

1000 

« 4 x 10 3 

i 

1000 

1 

as 1 x 10 6 

1000 

1 

1 

as 2 x 10 6 

1 

1000 

1000 

as 2 x 10 6 

1000 

1 

1000 

as 3 x 10 6 

1000 

1000 

1 

as 4 x 10 s 


Table 9.2 

Sum of the magnitude of fill discarded in an ILU(l) factorization of a set of regular 
grid problems with the given weights in each ordering direction. Grid size 30 X 30 X 30 


K X) K y ,K z Ordering directions (first, second, third) 

Weights z,i ijZ x^y y ± x 1 z y,z, x z^x^y z,y,g 


(1,100,1000) 

145 


145 

146 

260 


(1000,1,1) 

139 

139 



121 


(1000,1000,1) 



298 

299 


299 


Table 9.3 

Solver iterations required when natural orderings are applied to uniformly anisotropic 
3D problems. Grid size 30 X 30 X 30, solved with ILU(l) preconditioner. 
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K t ,K y ,K, 

Weights 


Ordering directions (first, second, third) 
x,z } y y ) x ) z y,z,x z : x,y 

z,y,x 

(1,100,1000) 

6542 

20565 

9448 

205626 

21675 

205627 

(1000,1,1) 

205740 

205740 

205626 

9451 

205626 

9451 

(1000,1000,1) 

205626 

9448 

205626 

9448 

6510 

6510 


Table 9.4 

Inverse-weight Laplacian two-sum for natural orderings applied to uniformly anisotropic 
3D problems. Grid size 30 x 30 x 30. 


Problem 

Ordering Method 

RCM 

MDF 

MUM 

Spectral 

ANIS03D 

4.82e+07 

0.82 

1.26 

1.54 

ANIS03E 

5.19e+07 

0.89 

0.98 

1.02 

ANIS03F 

5.70e+07 

0.87 

1.48 

1.59 

BIG1DIR3D 

7.52e+07 

0.97 

1.73 

1.02 

BIG1DIR3E 

7.52e+07 

0.97 

1.73 

1.42 

BIG1DIR3F 

1.35e+08 

0.55 

0.96 

0.57 

BIG1D1R3G 

7.21e+07 

0.53 

0.98 

0.44 

BIG1D1R3H 

1.56e+08 

0.88 

0.98 

1.00 

LAP7D 

3.53e+07 

1.01 

0.99 

1.00 

STONE3D 

6.42e+07 

0.85 

0.94 

0.91 

STONE3E 

6.70e+07 

0.87 

0.98 

1.02 

STONE3F 

4.58e+07 

0.90 

1.20 

1.05 


Table 9.5 

Work required to perform the solution iteration on 3D regular grid problems at ILU(l). 
The RCM column shows actual work required. The other columns show work as a 
fraction relative to the RCM column. 
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Problem 

Grid 

Block Range 

K x 

K v 

ANISO 

30x30 

(1, 1)— (15,15) . 

100 

1 



(16,16) — *-(30,30) 

100 

1 



Background 

1 

100 

BIG 1 DIR 

30 x 30 

Everywhere 

EH 

i 

ANISOCENT 

40 x 40 

(11,11) — >-(20,20) 

MM 

100 



(11,21) — ►(21,30) 


1 



Background 

m 

1 

EXTREMEANI 

40 x 40 

(10,31)— (40,40) 

Mi 

1000 



(10,11)— (40,30) 


1 



Background 

■ i 

1 

LAPD5 

30 x 30 

Everywhere 

i 

1 

LONGTHIN 

200 x io 

Everywhere 

1000 

1 

STONE 

31 x 31 

(15, 1)— (31,17) 

1 

100 



(6, 6)— (13,13) 

100 

1 



(13,22)— (20,29) 

0 

0 



Background 

1 

1 

STONEROT90 

31 x 3 1" 

(1,15)— (17,31) 

1 

100 



(6, 6)— (13,13) 

100 

1 



(22,13)— (29,20) 

0 

0 



Background 

1 

1 

VDVORST 

41 x 41 

(11,30)— (11,30) 

100 

0.1 



Background 

1 

0.0001 


Table A.l 

Definitions for 2D regular grid problems used in this work 
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Problem 

Grid 

Block Range 

K x 

K y 

K, 

ANIS03D 

30 x 30 x 30 

( 1, 1, 1)— (15,15,15) 

100 

1 

1 



(16,16, 1)— *-(30,30,16) 

100 

1 

1 



(16, 1,16)— *-(30,15,30) 

1 

100 

1 



( 1,16,16) — *-(15,30,30) 

1 

100 

1 



Background 

1 

1 

100 

ANIS03E 

30 x 30 x 30 

( 0, 0, 0)— (15,15,15) 

1 

100 

100 



(16,16, 0)— *(30,30,16) 

1 

100 

100 



(16, 1,16)— *(30,15,30) 

100 

1 

100 



( 1,16,16) — *(15,30,30) 

100 

1 

100 



Background 

100 

100 

1 

ANIS03F 

30 x 30 x 30 

( 0, 0, 0)— *(15,15,15) 

1 

100 

1000 



(16,16, 0) — *(30,30,16) 

1 

100 

1000 



(16, 1,16)— *(30,15,30) 

1000 

100 

1 



( 1,16,16) — *(15,30,30) 

1000 

100 

1 



Background 

100 

1000 

1 

BIG1DIR3D 

30 x 30 x 30 

Everywhere 

1 

100 

1000 

BIG1DIR3E 

30 x 30 x 30 

Everywhere 

100 

1 

1000 

BIG1D1R3F 

30 x 30 x 30 

Everywhere 

1000 

100 

1 

BIG1DIR3G 

30 x 30 x 30 

Everywhere 

1000 

1 

1 

BIG1DIR3H 

30 x 30 x 30 

Everywhere 

1000 

1000 

1 

LAP7D 

30 x 30 x 30 

Everywhere 

1 

1 

1 

STONE3D 

31 x 31 x 31 

(15, 1, 1)— *(31,17,17) 

1 

100 

100 



( 6, 6, 6) — *-(13,13,13) 

100 

1 

1 



(13,22,22)— >(20,29,29) 

0 

0 

0 



Background 

1 

1 

1 

STONE3E 

31 x 31 x 31 

(15, 1, 1)— *(31,17,17) 

1 

100 

100 



( 6, 6, 6)— >(13,13,13) 

100 

100 

1 



(13,22,22) — *(20,29,29) 

0 

0 

0 



Background 

1 

1 

1 

STONE3E 

31 x 31 x 31 

(15, 1, 1)— *(31,17,17) 

1 

1 

100 



( 6, 6, 6)— *(13,13,13) 

100 

1 

1 



(13,22,22)— >(20,29,29) 

0 

0 

0 



Background 

1 

1 

1 


Table A. 2 

Definitions for 3D regular grid problems used in this work 
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